28 research outputs found

Beyond Well-designed SPARQL

Author: Kaminski Mark
Kostylev Egor V.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 19th International Conference on Database Theory (ICDT 2016)
Publication date: 01/01/2015

SPARQL is the standard query language for RDF data. The distinctive feature of SPARQL is the OPTIONAL operator, which allows for partial answers when complete answers are not available due to lack of information. However, optional matching is computationally expensive - query answering is PSPACE-complete. The well-designed fragment of SPARQL achieves much better computational properties by restricting the use of optional matching - query answering becomes coNP-complete. However, well-designed SPARQL captures far from all real-life queries - in fact, only about half of the queries over DBpedia that use OPTIONAL are well-designed. In the present paper, we study queries outside of well-designed SPARQL. We introduce the class of weakly well-designed queries that subsumes well-designed queries and includes most common meaningful non-well-designed queries: our analysis shows that the new fragment captures about 99% of DBpedia queries with OPTIONAL. At the same time, query answering for weakly well-designed SPARQL remains coNP-complete, and our fragment is in a certain sense maximal for this complexity. We show that the fragment's expressive power is strictly in-between well-designed and full SPARQL. Finally, we provide an intuitive normal form for weakly well-designed queries and study the complexity of containment and equivalence.
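
As an illustration of the OPTIONAL operator mentioned in this abstract, the following is a minimal sketch of its usual left-outer-join reading over solution mappings. The function names (`compatible`, `join`, `optional`) and the sample data are assumptions made for this example; the sketch does not implement the well-designed or weakly well-designed fragments studied in the paper.

```python
def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2[var] == val for var, val in m1.items() if var in m2)


def join(left, right):
    """Merge every compatible pair of mappings."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


def optional(left, right):
    """Left outer join: extend a left mapping when possible, else keep it as is."""
    result = join(left, right)
    # Left mappings with no compatible partner survive as partial answers.
    result += [a for a in left if not any(compatible(a, b) for b in right)]
    return result


# Hypothetical data: everyone has a name binding, only alice has an email.
people = [{"p": "alice"}, {"p": "bob"}]
emails = [{"p": "alice", "e": "alice@example.org"}]
answers = optional(people, emails)
```

Here `bob` is still returned, only without a binding for `e`, which is the "partial answer" behaviour the abstract refers to.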

Dagstuhl Research Online Publication Server

Oxford University Research Archive

CONSTRUCT Queries in SPARQL

Author: Kostylev Egor V.
Reutter Juan L.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 18th International Conference on Database Theory (ICDT 2015)
Publication date: 01/01/2015

SPARQL has become the most popular language for querying RDF datasets, the standard data model for representing information in the Web. This query language has received a good deal of attention in the last few years: two versions of W3C standards have been issued, several SPARQL query engines have been deployed, and important theoretical foundations have been laid. However, many fundamental aspects of SPARQL queries are not yet fully understood. To this end, it is crucial to understand the correspondence between SPARQL and well-developed frameworks like relational algebra or first order logic. But one of the main obstacles on the way to such understanding is the fact that the well-studied fragments of SPARQL do not produce RDF as output. In this paper we embark on the study of SPARQL CONSTRUCT queries, that is, queries which output RDF graphs. This class of queries takes its rightful place in the standards and implementations, but contrary to SELECT queries, it has not yet attracted worthwhile theoretical research. Under this framework we are able to establish a strong connection between SPARQL and well-known logical and database formalisms. In particular, the fragment which does not allow for blank nodes in output templates corresponds to first order queries, its well-designed sub-fragment corresponds to positive first order queries, and the general language can be re-stated as a data exchange setting. These correspondences allow us to conclude that the general language is not composable, but the aforementioned blank-free fragments are. Finally, we enrich SPARQL with a recursion operator and establish fundamental properties of this extension.
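
To make the idea of a query that outputs an RDF graph concrete, here is a small sketch of how a blank-node-free CONSTRUCT template could be instantiated from solution mappings. The function name `construct`, the tuple encoding of triples, and the sample data are assumptions for illustration only; triples that would still contain an unbound variable are dropped, which follows the behaviour described in the SPARQL specification.

```python
def is_var(term):
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")


def construct(template, mappings):
    """Instantiate each template triple under each mapping and collect a graph."""
    graph = set()
    for mapping in mappings:
        for triple in template:
            instance = tuple(mapping.get(t, t) if is_var(t) else t for t in triple)
            # Skip triples in which some variable stayed unbound.
            if not any(is_var(t) for t in instance):
                graph.add(instance)
    return graph


# Hypothetical template: make the 'knows' relation symmetric.
template = [("?x", "knows", "?y"), ("?y", "knows", "?x")]
mappings = [{"?x": "a", "?y": "b"}, {"?x": "c"}]
graph = construct(template, mappings)
```

Because the output is a set of triples, it can be fed to another query, which is the composability question the abstract raises for the blank-free fragments.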

Dagstuhl Research Online Publication Server

Two Variable Logic with Ultimately Periodic Counting

Author: Benedikt Michael
Kostylev Egor V.
Tan Tony
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)
Publication date: 01/01/2020

We consider the extension of FO² with quantifiers that state that the number of elements where a formula holds should belong to a given ultimately periodic set. We show that both satisfiability and finite satisfiability of the logic are decidable. We also show that the spectrum of any sentence is definable in Presburger arithmetic. In the process we present several refinements to the "biregular graph method". In this method, decidability issues concerning two-variable logics are reduced to questions about Presburger definability of integer vectors associated with partitioned graphs, where nodes in a partition satisfy certain constraints on their in- and out-degrees.
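
For readers unfamiliar with the term, a set of natural numbers is ultimately periodic if, beyond some threshold, membership repeats with a fixed period. The sketch below shows one possible finite representation and how a counting quantifier over a finite domain could be checked against it. The class name, field names, and the helper `count_satisfies` are assumptions for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UltimatelyPeriodicSet:
    prefix: frozenset     # members strictly below the threshold
    threshold: int        # from here on, membership is periodic
    period: int           # positive period
    residues: frozenset   # values of (n - threshold) % period that are members

    def __contains__(self, n):
        if n < 0:
            return False
        if n < self.threshold:
            return n in self.prefix
        return (n - self.threshold) % self.period in self.residues


def count_satisfies(domain, predicate, allowed):
    """Check that the number of elements satisfying predicate lies in allowed."""
    return sum(1 for x in domain if predicate(x)) in allowed


# The set {1} together with all even numbers from 2 onwards.
one_or_even = UltimatelyPeriodicSet(frozenset({1}), 2, 2, frozenset({0}))
```

For example, `count_satisfies(range(10), lambda x: x % 3 == 0, one_or_even)` counts four multiples of three and checks that 4 belongs to the set.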

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Oxford University Research Archive

Stratified Negation in Limit Datalog Programs

Author: Grau Bernardo Cuenca
Horrocks Ian
Kaminski Mark
Kostylev Egor V.
Motik Boris
Publication date: 25/04/2018

There has recently been an increasing interest in declarative data analysis, where analytic tasks are specified using a logical language, and their implementation and optimisation are delegated to a general-purpose query engine. Existing declarative languages for data analysis can be formalised as variants of logic programming equipped with arithmetic function symbols and/or aggregation, and are typically undecidable. In prior work, the language of $\mathit{limit\ programs}$ was proposed, which is sufficiently powerful to capture many analysis tasks and has a decidable entailment problem. Rules in this language, however, do not allow for negation. In this paper, we study an extension of limit programs with stratified negation-as-failure. We show that the additional expressive power makes reasoning computationally more demanding, and provide tight data complexity bounds. We also identify a fragment with tractable data complexity and sufficient expressivity to capture many relevant tasks. Comment: 14 pages; full version of a paper accepted at IJCAI-1

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive

Foundations of Declarative Data Analysis Using Limit Datalog Programs

Author: Grau Bernardo Cuenca
Horrocks Ian
Kaminski Mark
Kostylev Egor V.
Motik Boris
Publication date: 01/01/2017

Motivated by applications in declarative data analysis, we study $\mathit{Datalog}_{\mathbb{Z}}$, an extension of positive Datalog with arithmetic functions over integers. This language is known to be undecidable, so we propose two fragments. In $\mathit{limit}~\mathit{Datalog}_{\mathbb{Z}}$ predicates are axiomatised to keep minimal/maximal numeric values, allowing us to show that fact entailment is coNExpTime-complete in combined, and coNP-complete in data complexity. Moreover, an additional $\mathit{stability}$ requirement causes the complexity to drop to ExpTime and PTime, respectively. Finally, we show that stable $\mathit{Datalog}_{\mathbb{Z}}$ can express many useful data analysis tasks, and so our results provide a sound foundation for the development of advanced information systems. Comment: 23 pages; full version of a paper accepted at IJCAI-17; v2 fixes some typos and improves the acknowledgment
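
The idea of a predicate that keeps only a minimal numeric value can be pictured with a shortest-path computation. The sketch below is a plain fixpoint loop written for this listing, not the entailment procedure from the paper; the function name `shortest_paths`, the edge encoding, and the assumption of non-negative weights are all choices made for the example. It mimics two rules, `dist(source, 0)` and `dist(y, m + w) :- dist(x, m), edge(x, y, w)`, where `dist` retains only the smallest value derived for each node.

```python
def shortest_paths(edges, source):
    """Apply the two rules until no smaller value can be derived.

    edges is a list of (x, y, w) triples with non-negative weights w.
    """
    dist = {source: 0}  # rule 1: dist(source, 0)
    changed = True
    while changed:
        changed = False
        for x, y, w in edges:
            # rule 2: derive dist(y, m + w) and keep it only if it is smaller.
            if x in dist and dist[x] + w < dist.get(y, float("inf")):
                dist[y] = dist[x] + w
                changed = True
    return dist


# Hypothetical weighted graph.
edges = [("a", "b", 4), ("a", "c", 1), ("c", "b", 2)]
distances = shortest_paths(edges, "a")
```

Storing one number per node, rather than every derivable value, is what keeps the set of facts finite in this illustration.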

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive

Classification of annotation semirings over containment of conjunctive queries

Author: Kostylev Egor V.
Reutter Juan L.
Salamon András Z.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 10/03/2022

Funding: This work is supported under SOCIAM: The Theory and Practice of Social Machines, a project funded by the UK Engineering and Physical Sciences Research Council (EPSRC) under grant number EP/J017728/1. This work was also supported by FET-Open Project FoX, grant agreement 233599; EPSRC grants EP/F028288/1, G049165 and J015377; and the Laboratory for Foundations of Computer Science.

We study the problem of query containment of conjunctive queries over annotated databases. Annotations are typically attached to tuples and represent metadata, such as probability, multiplicity, comments, or provenance. It is usually assumed that annotations are drawn from a commutative semiring. Such databases pose new challenges in query optimization, since many related fundamental tasks, such as query containment, have to be reconsidered in the presence of propagation of annotations. We axiomatize several classes of semirings for each of which containment of conjunctive queries is equivalent to existence of a particular type of homomorphism. For each of these types, we also specify all semirings for which existence of a corresponding homomorphism is a sufficient (or necessary) condition for the containment. We develop new decision procedures for containment for some semirings which are not in any of these classes. This generalizes and systematizes previous approaches.

Postprint. Peer reviewed
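
As a rough illustration of how annotations from a commutative semiring propagate through a conjunctive query, the sketch below evaluates the two-step path query `q(x, z) :- E(x, y), E(y, z)` under two semirings: natural numbers with `+` and `*` (multiplicities) and Booleans with `or` and `and` (plain set semantics). The `Semiring` class, the function `eval_path2`, and the sample relation are assumptions for this example; the sketch does not implement any of the containment procedures from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    zero: Any
    add: Callable[[Any, Any], Any]   # combines alternative derivations
    mul: Callable[[Any, Any], Any]   # combines jointly used tuples


BAG = Semiring(0, lambda a, b: a + b, lambda a, b: a * b)
BOOL = Semiring(False, lambda a, b: a or b, lambda a, b: a and b)


def eval_path2(edges, sr):
    """Annotate the answers of q(x, z) :- E(x, y), E(y, z)."""
    out = {}
    for (x, y1), a1 in edges.items():
        for (y2, z), a2 in edges.items():
            if y1 == y2:
                # Multiply along one derivation, add across derivations.
                out[(x, z)] = sr.add(out.get((x, z), sr.zero), sr.mul(a1, a2))
    return out


# Hypothetical edge relation annotated with multiplicities.
bag_edges = {("a", "b"): 2, ("b", "c"): 3, ("a", "d"): 1, ("d", "c"): 1}
bool_edges = {edge: count > 0 for edge, count in bag_edges.items()}

bag_answers = eval_path2(bag_edges, BAG)
bool_answers = eval_path2(bool_edges, BOOL)
```

The same query yields the answer `("a", "c")` in both cases, but with annotation 7 under multiplicities and `True` under Booleans, which hints at why containment has to be reconsidered per semiring.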

St Andrews Research Repository

The Bag Semantics of Ontology-Based Data Access

Author: Grau Bernardo Cuenca
Horrocks Ian
Kaminski Mark
Konstantinidis George
Kostylev Egor V.
Nikolaou Charalampos
Publication date: 01/01/2017

Ontology-based data access (OBDA) is a popular approach for integrating and querying multiple data sources by means of a shared ontology. The ontology is linked to the sources using mappings, which assign views over the data to ontology predicates. Motivated by the need for OBDA systems supporting database-style aggregate queries, we propose a bag semantics for OBDA, where duplicate tuples in the views defined by the mappings are retained, as is the case in standard databases. We show that bag semantics makes conjunctive query answering in OBDA coNP-hard in data complexity. To regain tractability, we consider a rather general class of queries and show its rewritability to a generalisation of the relational calculus to bags.

arXiv.org e-Print Archive

Crossref

Southampton (e-Prints Soton)

Oxford University Research Archive

On the Correspondence Between Monotonic Max-Sum GNNs and Datalog

Author: Cucala David Tena
Grau Bernardo Cuenca
Kostylev Egor V.
Motik Boris
Publication date: 29/05/2023

Although there has been significant interest in applying machine learning techniques to structured data, the expressivity (i.e., a description of what can be learned) of such techniques is still poorly understood. In this paper, we study data transformations based on graph neural networks (GNNs). First, we note that the choice of how a dataset is encoded into a numeric form processable by a GNN can obscure the characterisation of a model's expressivity, and we argue that a canonical encoding provides an appropriate basis. Second, we study the expressivity of monotonic max-sum GNNs, which cover a subclass of GNNs with max and sum aggregation functions. We show that, for each such GNN, one can compute a Datalog program such that applying the GNN to any dataset produces the same facts as a single round of application of the program's rules to the dataset. Monotonic max-sum GNNs can sum an unbounded number of feature vectors which can result in arbitrarily large feature values, whereas rule application requires only a bounded number of constants. Hence, our result shows that the unbounded summation of monotonic max-sum GNNs does not increase their expressive power. Third, we sharpen our result to the subclass of monotonic max GNNs, which use only the max aggregation function, and identify a corresponding class of Datalog programs.

arXiv.org e-Print Archive